Reduced size objects headers

ABSTRACT

A method and apparatus for reducing memory requirements in a computing environment. The method includes reducing the size of a header for a data structure by creating a header consisting of index information. Alternatively, the header may also include garbage collection information. The invention also provides a data structure for an object-oriented programming environment. The data structure includes: 1) a header consisting of index information and 2) one or more fields. Unlike prior data structures, the header does not include information regarding the data structure's size; where its references are; its dispatch table; hash code information; or monitor information.

This application claims the benefit of Provisional application Ser. No. 60/180,553, filed Feb. 7, 2000.

BACKGROUND OF THE INVENTION

The present invention relates to reducing memory requirements or overhead in a computing environment. More specifically, the invention relates to reducing the size of object headers, thereby reducing the amount of memory required to store the objects of a program.

Object-oriented programming involves defining objects or, more broadly, abstractions of items that need to be manipulated or processed in order to solve the problem or meet the objective addressed by the program. Objects are a type of data structure that includes defined information or attributes. Objects can also be manipulated in a manner similar to variables (such as integer, Boolean, and floating point variables) in procedural programming languages. However, the types of operations (functions or methods) that may be performed on an object are defined by the programmer. In addition, programmers can create relationships between one object and another. For example, objects can inherit characteristics from other objects.

An object is an instance of a class. A class is a specification or template that defines the characteristics (attributes and methods) for all objects of that class type. One of the principal advantages of object-oriented programming techniques over procedural programming techniques is that they enable programmers to create modules that do not need to be changed when a new type of object is added. A programmer can simply create a new object that inherits many of its features from existing objects. This makes object-oriented programs easier to modify.

There are several object-oriented programming languages, including C++ and Java. Before it is executed, Java source code is usually translated or compiled into byte code by a Java compiler. The byte code is then interpreted or converted to machine language at run time. Java can be implemented as an interpreted language, meaning programs written in Java can be run using an interpreter. An interpreter decodes and runs a program at the same time. Specifically, the interpreter decodes one line of code, executes that line, and then proceeds to the next line of code.

The Java Virtual Machine (“VM”) carries out the task of interpreting or otherwise executing the Java byte code. Java VMs are present in most browsers and widely licensed for use in a variety of computing devices. With most other programming languages, different versions of a program must be developed for different computer environments; a compiled Java program, by contrast, can run on any device that has a Java VM. Further, Java programs can be stored in relatively small files, which is important in applications where memory is limited (e.g., when running software on cell phones, personal digital assistants, and the like) and makes transmitting the programs over networks easier and faster.

While it is possible to create a computing environment specifically designed for Java (e.g., by using a Java chip), most Java platforms are deployed on top of a non-Java host environment that employs a standard processor with a Java VM installed in memory. A Java platform is a programming environment that includes the Java VM and the Java application programming interface (“API”). The Java API consists of a set of predefined classes.

Most Java VMs use two 32-bit words at the front of each object as a “header.” The header is used to provide the VM and the garbage collector (a routine that reclaims memory occupied by program segments that are no longer active) certain information about every object in the program being executed. This information includes the object's class, size, and dispatch table (used to call virtual methods). The header also provides information regarding the object's references, the bits (indicating color) used by the garbage collector, hash code information, and monitor information (thread synchronization activity).

SUMMARY OF THE INVENTION

In Java and other object-oriented programming languages, even the most trivial programs can have thousands of objects. As noted, each object is assigned a header, typically of 64 bits or more. Thus, even a small program can require many tens of kilobytes of memory to run, and a significant portion of this may be object headers. While memory requirements are not a concern on many computing platforms, devices such as personal digital assistants, cell phones, and the like have relatively small memory footprints, which limit their ability to run many programs. Accordingly, it would be beneficial to reduce the size of object headers to reduce memory demands so that programs could run on a variety of platforms, including those with limited memory.
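The arithmetic behind this concern can be sketched as follows. The object count of 10,000 is an assumption chosen only for illustration, not a measured figure. The sketch compares a conventional two-word header with a one-word header, with both using 32-bit words.

```python
# Hypothetical workload: the object count is illustrative, not measured.
OBJECT_COUNT = 10_000

TWO_WORD_HEADER_BYTES = 2 * 4   # conventional header: two 32-bit words
ONE_WORD_HEADER_BYTES = 1 * 4   # reduced header: a single 32-bit word


def header_overhead(object_count: int, header_bytes: int) -> int:
    """Total bytes spent on object headers alone."""
    return object_count * header_bytes


before = header_overhead(OBJECT_COUNT, TWO_WORD_HEADER_BYTES)
after = header_overhead(OBJECT_COUNT, ONE_WORD_HEADER_BYTES)
print(f"two-word headers: {before} bytes")          # 80000 bytes
print(f"one-word headers: {after} bytes")           # 40000 bytes
print(f"saved:            {before - after} bytes")  # 40000 bytes
```

Under these assumptions, halving the header saves about 40 KB. This is the "many tens of Kbytes" order of magnitude described above.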

In one embodiment, the invention provides a method of reducing memory requirements in a computer environment. The method includes creating an object header for an object consisting of index information and garbage collection information. The index information references, at least indirectly, class information for the object. In one form of the invention, the method involves creating a global index of dispatch tables, and creating a dispatch table that is referenced by the global index. In this form, the index information references class information through the global index and the dispatch table.

The header has a finite number of bits. In one embodiment, the garbage collection information is stored in a predefined number of the bottommost bits of the header, and the index information is stored in the remaining bits of the header. Alternatively, a predefined number of the bottom bits are masked and the garbage collector is instructed to use the masked bits for garbage collection.
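The bit layout described above can be illustrated with a minimal sketch. It assumes a 32-bit header that keeps N = 2 bits of garbage collection information at the bottom. The value of N is an assumption, since the text only requires "a predefined number" of bits. The function names are hypothetical.

```python
# Sketch of a 32-bit header: GC bits in the bottom GC_BITS bits, index
# information in the remaining 32 - GC_BITS bits. GC_BITS = 2 is an assumption.
GC_BITS = 2
GC_MASK = (1 << GC_BITS) - 1          # 0b11
INDEX_BITS = 32 - GC_BITS
MAX_INDEX = (1 << INDEX_BITS) - 1


def make_header(index: int, gc_info: int = 0) -> int:
    """Pack index information and GC information into one 32-bit word."""
    if not 0 <= index <= MAX_INDEX:
        raise ValueError("index does not fit in the header")
    if not 0 <= gc_info <= GC_MASK:
        raise ValueError("GC information does not fit in the header")
    return (index << GC_BITS) | gc_info


def header_index(header: int) -> int:
    """Recover the index information by shifting out the GC bits."""
    return header >> GC_BITS


def header_gc(header: int) -> int:
    """Recover the GC information from the bottom bits."""
    return header & GC_MASK


def set_gc(header: int, gc_info: int) -> int:
    """Return the header with new GC bits and the same index information."""
    if not 0 <= gc_info <= GC_MASK:
        raise ValueError("GC information does not fit in the header")
    return (header & ~GC_MASK & 0xFFFFFFFF) | gc_info


h = make_header(index=5, gc_info=1)
assert header_index(h) == 5 and header_gc(h) == 1
h = set_gc(h, 0)                      # e.g. the collector clears its mark
assert header_index(h) == 5 and header_gc(h) == 0
```

Storing the GC bits at the bottom means the collector can update them with a single AND/OR. The class index is then recovered with a single shift.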

The invention may be implemented in a computing environment that includes a data structure having a header. The header consists of index information and garbage collection information. The computing environment also includes a global index of dispatch tables referenced by the data structure, a dispatch table referenced by the global index, and class data referenced by the dispatch table. Preferably, the header is no larger than one 32-bit word. When so sized, the garbage collection information is stored in a predefined number of the lowermost bits of the 32-bit word and the index information is stored in the remaining bits. As noted above, storage of information for the garbage collector may also be implemented using masking.

The invention also provides a data structure for an object-oriented programming environment. The data structure includes 1) a header consisting of index information and 2) one or more fields. Unlike prior data structures, the header does not include information regarding the data structure's size; where its references are; its dispatch table (except in one embodiment of the invention where the header does contain a dispatch table pointer); garbage collection information; hash code information; or monitor information.

As is apparent from the above, it is an advantage of the invention to provide a method for reducing data structure header size and an apparatus with reduced data structure header size. Other features and advantages of the present invention will become apparent by consideration of the detailed description and accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of a programming object.

FIG. 2 is a schematic diagram of a programming array.

FIG. 3 is a schematic diagram of a dispatch table referenced by an object or an array according to the prior art.

FIG. 4 is a schematic diagram of a dispatch table referenced by an object or array according to one embodiment of the invention.

FIG. 5 is a schematic diagram of a garbage collection data structure.

FIG. 6 is a schematic diagram of a dispatch table referenced by an object or array according to another embodiment of the invention.

FIG. 7 is a schematic diagram of a garbage collection process performed on a set of objects with reduced size headers.

FIG. 8 is a schematic diagram illustrating one mechanism for storing hash codes.

FIG. 9 is a schematic diagram illustrating another mechanism for storing monitor information.

DETAILED DESCRIPTION

Before embodiments of the invention are explained, it is to be understood that the invention is not limited in its application to the details of the construction and the arrangements of the components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments and of being practiced or being carried out in various ways. Also, it is to be understood that the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The following description assumes that the reader is familiar with computer science and has a working knowledge of object-oriented and multi-threaded programming languages, as one of ordinary skill in the art would possess.

FIGS. 1 and 2 schematically illustrate common programming elements. FIG. 1 illustrates the layout of a typical object 10 generated using an object-oriented programming language. The object 10 includes an object reference 12, a header including an information word 14 and a dispatch field 16, and a number of object fields 18. FIG. 2 illustrates the architecture of a typical array 25. The array 25 includes an array reference 27, an information word 29, a dispatch pointer 31, an array length field 33, and array elements shown as array data 35.
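The two layouts can be made concrete with Python's `struct` module. This is a sketch only. It assumes little-endian 32-bit words and 32-bit fields and elements, and the field values are placeholders. It shows how much of a small object the conventional header occupies.

```python
import struct

# Assumed little-endian layouts with 32-bit words, mirroring FIGS. 1 and 2.
OBJECT_HEADER = struct.Struct("<II")   # information word 14, dispatch field 16
ARRAY_PREFIX = struct.Struct("<III")   # information word 29, dispatch pointer 31, length 33


def pack_object(info_word: int, dispatch: int, fields: list[int]) -> bytes:
    """Lay out an object: two header words followed by its 32-bit fields."""
    return OBJECT_HEADER.pack(info_word, dispatch) + struct.pack(f"<{len(fields)}I", *fields)


def pack_array(info_word: int, dispatch: int, elements: list[int]) -> bytes:
    """Lay out an array: info word, dispatch pointer, length, then the elements."""
    prefix = ARRAY_PREFIX.pack(info_word, dispatch, len(elements))
    return prefix + struct.pack(f"<{len(elements)}I", *elements)


obj = pack_object(0, 0x1000, [7, 8])    # placeholder values
arr = pack_array(0, 0x2000, [1, 2, 3])
print(OBJECT_HEADER.size, len(obj))     # 8 16
print(ARRAY_PREFIX.size, len(arr))      # 12 24
```

In this sketch, the header accounts for half of a two-field object.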

The object 10 and array 25 (generically, data structures) have information words. As noted above, a header or information word provides the VM and the garbage collector certain information about the object or array, as the case may be. For purposes of brevity, the description that follows will focus on objects, but it should be understood that the principles set forth herein could be applied to arrays and other data structures. In most known implementations, the header (information word 14 and dispatch field 16) is at least 64 bits in size and allows the following information to be derived:

A. the object's class;

B. the object's size;

C. where the object's references are;

D. the object's dispatch table (for calling virtual methods);

E. garbage collection information (usually color);

F. hash code information; and

G. monitor information.

As noted above, the overhead associated with the information words for each object and array in a program can be substantial. The inventor has found that the size of the information word 14 can be reduced by combining information regarding the object's class, size, references, and dispatch table, because these information elements change on a class-by-class basis rather than on a per-object basis. Garbage collection information, by contrast, is needed on a per-object basis. Although this information can be stored separately from the object in order to reduce the size of the information word 14, in one embodiment of the invention, garbage collection information is maintained with its subject object. This is done because, in most cases, moving the garbage collection information to separate storage merely shifts the memory burden to another location.

The inventor has also found that hash code information can be removed from the information word 14, because in most instances the hash code can be derived from the object's address in memory. However, if the address of the object 10 changes (for example, when the object is moved in memory), the object's derived hash code changes as well. Thus, in one embodiment of the invention, hash code information is removed from the information word 14 only in those cases where the object's address remains unchanged. Alternatively, the hashing algorithm is modified so that it generates the same hash value regardless of whether the algorithm uses a new address or a previous address for the object.

The inventor has also found that monitor information (information concerning thread synchronization) is needed infrequently (i.e., relatively few objects are ever used for synchronization and even fewer objects are synchronized concurrently). Thus, the size of the object header (information word 14 and dispatch field 16) is reduced further by moving the monitor information into an independent data structure indexed by the object's address.
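The idea of an independent monitor structure indexed by object address can be sketched as follows. This is illustrative only. `MonitorTable` is a hypothetical name, and `threading.RLock` stands in for a VM monitor. Monitors are created lazily, so objects that are never synchronized on get no entry.

```python
import threading

# Sketch of a side table holding monitors outside object headers, keyed by
# object address. threading.RLock stands in for a VM monitor (an assumption).


class MonitorTable:
    def __init__(self) -> None:
        self._lock = threading.Lock()   # guards the table itself
        self._monitors = {}             # object address -> monitor

    def monitor_for(self, address: int):
        """Return the monitor for the object at `address`, creating it on first use."""
        with self._lock:
            monitor = self._monitors.get(address)
            if monitor is None:
                monitor = threading.RLock()
                self._monitors[address] = monitor
            return monitor

    def __len__(self) -> int:
        return len(self._monitors)


table = MonitorTable()
with table.monitor_for(0x2345678):      # enter and exit one object's monitor
    pass
print(len(table))                       # 1: only the synchronized object has an entry
```

Because the key is the object's address, this sketch assumes objects do not move. A VM that relocates objects would also have to update the table, just as with address-derived hash codes.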

When the optimizations explained above are implemented, only two data items need to be stored in the information word 14: 1) a class-specific pointer/index and 2) garbage collection information. With only two items to store in the information word 14, its size can be reduced to a single 32-bit word. Preferably, the bottom n bits of the information word 14 are used for the garbage collection information and the remaining 32−n bits are used for class information.

Additional details regarding how to obtain the optimizations noted above are best explained by reference to FIGS. 3, 4, and 5. FIG. 3 illustrates an element 40 (a data structure such as an object or an array) having an element reference 42, an information word 44, and a dispatch pointer 46. The dispatch pointer 46 references a dispatch table 48 having a plurality of slots 50. The plurality of slots 50 includes a zero slot 52, a first slot 54, a second slot 56, and a third slot 58. The slot 52 contains a class pointer and the slots 54-58 contain method data. The dispatch pointer 46 provides a mechanism for determining the class of the element 40. The dispatch pointer refers to the class pointer in slot 52 of the dispatch table 48. The class pointer refers to a class element 60. The dispatch table 48 provides a mechanism to implement virtual methods for different classes of objects, such as a class and one or more of its subclasses. The method data in the dispatch table can be modified to support additional or modified methods needed in objects of a subclass without the need to modify the basic structure of the objects themselves. In this way, objects of a class and objects of a subclass of that class retain a similar architecture.
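The prior-art arrangement of FIG. 3 can be modeled roughly as follows. Slot 0 of the dispatch table holds the class pointer, and the remaining slots hold methods. A subclass keeps the same slot layout and overrides only the entries it changes. All class and method names here are hypothetical.

```python
from dataclasses import dataclass, field

# Toy model of FIG. 3: an element's dispatch pointer refers to a dispatch
# table whose slot 0 holds the class pointer and whose other slots hold methods.


@dataclass
class ClassInfo:
    name: str


@dataclass
class Element:
    info_word: int
    dispatch: list                        # the dispatch table this element points to
    fields: dict = field(default_factory=dict)


def class_of(elem: Element) -> ClassInfo:
    return elem.dispatch[0]               # slot 0: class pointer


def invoke_virtual(elem: Element, slot: int) -> str:
    return elem.dispatch[slot](elem)      # slots 1..n: method data


def shape_area(elem: Element) -> str:
    return "Shape.area"


def square_area(elem: Element) -> str:
    return "Square.area"


def describe(elem: Element) -> str:
    return f"instance of {class_of(elem).name}"


shape_table = [ClassInfo("Shape"), shape_area, describe]
square_table = [ClassInfo("Square"), square_area, describe]   # overrides slot 1 only

s = Element(0, shape_table)
q = Element(0, square_table)
print(invoke_virtual(s, 1), invoke_virtual(q, 1))   # Shape.area Square.area
print(invoke_virtual(q, 2))                         # instance of Square
```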

FIG. 4 illustrates one embodiment of the invention. In particular, FIG. 4 illustrates an object 75 having a header 77. The header 77 includes index information 79 and garbage collection information 81. The index information 79 provides a reference to a global index 85. The global index 85 provides an index of dispatch tables for all objects having a reduced size header. Each entry or element of the global index 85 provides a reference to an appropriate dispatch table (in this case dispatch table 90) for the object at hand. The dispatch table 90 includes a plurality of pointers 92 that reference the appropriate class information for the object at hand. In this case, the pointer 92 associated with the object 75 references a class element 94.
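The FIG. 4 lookup chain (header 77 → global index 85 → dispatch table 90 → class element 94) can be sketched as a sequence of Python lookups. The sketch makes three assumptions: GC_BITS = 2, slot 0 of the dispatch table holds the class pointer as in FIG. 3, and the table contents (such as the class name "Point") are hypothetical.

```python
# Toy model of FIG. 4. Contents and GC_BITS = 2 are assumptions.
GC_BITS = 2

class_element_94 = {"name": "Point", "instance_size": 12}
dispatch_table_90 = [class_element_94]            # slot 0: class pointer
global_index_85 = [None, None, None, dispatch_table_90]   # full-size references


def make_header(index: int, gc_info: int = 0) -> int:
    return (index << GC_BITS) | gc_info


def dispatch_table_of(header: int) -> list:
    """Index information -> global index entry -> dispatch table."""
    return global_index_85[header >> GC_BITS]


def class_of(header: int) -> dict:
    """Dispatch table -> class element (one more indirection)."""
    return dispatch_table_of(header)[0]


header_77 = make_header(index=3, gc_info=1)
print(class_of(header_77)["name"])                # Point
```

Per-class data such as the instance size is reached only through these indirections. This is the source of the performance costs discussed later in this description.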

For the embodiment shown in FIG. 4, the bit size of the index information 79 is the bit size of the header 77 minus the number of bits used to store the garbage collection information 81. As noted above, a reduced size header could have a size of 32 bits. If the garbage collection information uses two or three bits of the header, only 30 or 29 bits, respectively, are available for the index information 79. Because the index information 79 references a global index whose entries are full-size references (in this case 32 bits), dispatch tables can still be accessed easily.

FIG. 5 illustrates a garbage collection data structure 100 that stores garbage collection information for all objects in a program. The data structure 100 may be used in an alternative embodiment of the invention where, instead of maintaining the garbage collection information (such as the information 81) in the object's header (such as the header 77), the garbage collection information for all objects is placed in the data structure 100. The data structure 100 is normally accessed by the garbage collector using the address of an object as an index. Whether the invention is implemented with a separate garbage collection data structure will depend on various factors, including the exact memory reclamation algorithm used by the garbage collector.
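An address-indexed GC side table like the data structure 100 might look roughly like this. The sketch makes three assumptions: objects start on 8-byte boundaries, the heap occupies a single contiguous range starting at the hypothetical `HEAP_BASE`, and each object needs one byte of GC state (its color).

```python
# Sketch of a side table for per-object GC information, indexed by address.
# HEAP_BASE, HEAP_SIZE, and 8-byte alignment are assumptions.
HEAP_BASE = 0x10000
HEAP_SIZE = 64 * 1024
ALIGN_SHIFT = 3                                  # 8-byte alignment

WHITE, GRAY, BLACK = 0, 1, 2

gc_table = bytearray(HEAP_SIZE >> ALIGN_SHIFT)   # one entry per possible object start


def gc_slot(address: int) -> int:
    """Map an object address to its entry in the GC table."""
    offset = address - HEAP_BASE
    if offset < 0 or offset >= HEAP_SIZE or offset & ((1 << ALIGN_SHIFT) - 1):
        raise ValueError("not an aligned heap address")
    return offset >> ALIGN_SHIFT


def set_color(address: int, color: int) -> None:
    gc_table[gc_slot(address)] = color


def color_of(address: int) -> int:
    return gc_table[gc_slot(address)]


set_color(0x10040, BLACK)
print(gc_slot(0x10040), color_of(0x10040))       # 8 2
```

Here the table costs 8 KB for a 64 KB heap. This illustrates the point above that separate GC storage can merely shift the memory burden elsewhere.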

FIG. 6 illustrates yet another embodiment of the invention. In the embodiment shown, an element 110 (object or array) having a reference 112 includes a header 114 having a dispatch pointer 116. The dispatch pointer 116 references a dispatch table 118 having a plurality of slots 120. The plurality of slots 120 includes a zero slot 122, a first slot 124, a second slot 126, and a third slot 128. The slot 122 contains a class pointer and the slots 124-128 contain method data. The class pointer refers to a class element 130. The embodiment shown in FIG. 6 does not include a global index or a data structure for garbage collection information. Rather, a certain number of bottom bits (preferably 3 or fewer) in the dispatch pointer 116 are masked, and the dispatch tables are aligned in memory to match the masking. For example, if the bottom two bits of the objects' dispatch pointers are masked, then the bottom two bits of each dispatch table's address would be zero (that is, each dispatch table would start on a 4-byte boundary).
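The masking scheme can be sketched with plain integer arithmetic. The sketch assumes two masked bits and 4-byte-aligned dispatch tables. The pointer value below is hypothetical.

```python
# Sketch of the FIG. 6 masking scheme: dispatch tables are assumed to be
# 4-byte aligned, so the two low bits of a valid dispatch pointer are zero
# and can be borrowed for GC marks.
MARK_BITS = 2
MARK_MASK = (1 << MARK_BITS) - 1        # 0b11


def is_aligned(address: int) -> bool:
    return (address & MARK_MASK) == 0


def with_mark(dispatch_ptr: int, mark: int) -> int:
    """Store a GC mark in the low bits without disturbing the table address."""
    return (dispatch_ptr & ~MARK_MASK) | (mark & MARK_MASK)


def mark_of(dispatch_ptr: int) -> int:
    return dispatch_ptr & MARK_MASK


def table_address(dispatch_ptr: int) -> int:
    """Strip the mark bits to recover the real dispatch table address."""
    return dispatch_ptr & ~MARK_MASK


ptr_116 = 0x0804_A010                   # hypothetical address of dispatch table 118
assert is_aligned(ptr_116)
marked = with_mark(ptr_116, 1)
print(hex(marked), hex(table_address(marked)), mark_of(marked))
# 0x804a011 0x804a010 1
```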

The bottom bits of the dispatch pointer 116 are masked in such a way as to provide garbage collection information to the garbage collector. As is known, garbage collection (“GC”) can be implemented in a variety of ways. Concurrent GC occurs while the program executes. Pausing GC stops the execution of the program while garbage collection occurs. Whether or not GC occurs concurrently, various methods may be used to carry out the task of reclaiming memory. One common method is known as “mark and sweep” GC. The general principles of mark and sweep GC are well known and are not discussed herein. However, for the masking embodiment of the invention, it is assumed that GC is carried out with a pausing garbage collector using a mark and sweep methodology.

As noted, the bottom bits of the dispatch pointer are masked, and the address of the dispatch table 118 is aligned accordingly. During pausing GC, all the threads (or execution paths) of the program at hand are stopped. The garbage collector examines the objects in the program, such as the element 110, and while the garbage collector is carrying out its tasks, no other elements access the objects. Because no other elements need to or can access the dispatch pointer 116 during GC, the garbage collector may use the bottom bits of the dispatch pointer 116 for marking purposes. When the garbage collector completes its tasks, it returns those bits to zero before restarting the other threads of the program.

FIG. 7 illustrates, in a general manner, a typical GC process for a set of objects 145 of a program. As is known, GC is ordinarily accomplished by defining a set of roots (often a “root set” of pointers into the objects of the program) and then determining reachability from the roots. In FIG. 7, a root set 150 is shown with a pointer or reference 152 to an object 154. The object 154 has a reference 156 to an object 158. The object 158 has a reference 160 to an object 162, and the object 162 has a reference 164 back to the object 158. The set of objects 145 also includes an object 170 having a reference 172 to another object 174, but the objects 170 and 174 are unreachable from the root set 150.

The garbage collector analyzes each object in a tracing process and sets one of the bottom bits to indicate that the object has been visited by the garbage collector. In other words, the garbage collector marks the object (in a three-color marking system, the object is colored gray or black). The objects 170 and 174 cannot be reached and, therefore, are not marked (they remain white). Once the garbage collector has traced through all of the reachable objects, it sweeps the heap, removing those objects that are unmarked (white). During the sweeping phase, the garbage collector changes the bottom bits of the dispatch pointer 116 back to zeroes so that execution of the program can continue normally once garbage collection is complete. All that is required to accomplish this is one extra assignment instruction per object in the garbage collector.
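The FIG. 7 walkthrough can be replayed with a toy pausing mark-and-sweep collector that borrows the low bit of each dispatch pointer as its mark. This is a simplified sketch. It uses a single mark bit rather than separate gray and black colors. The dispatch table address is hypothetical, and "reclaiming" an object simply means dropping it from the list.

```python
# Toy pausing mark-and-sweep over the objects of FIG. 7. The low bit of each
# (aligned) dispatch pointer is borrowed as the mark while mutators are stopped.
MARK = 0b01


class Obj:
    def __init__(self, name: str, dispatch_ptr: int) -> None:
        self.name = name
        self.dispatch_ptr = dispatch_ptr    # low bits are zero outside GC
        self.refs = []


def mark_from(roots):
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj.dispatch_ptr & MARK:         # already visited
            continue
        obj.dispatch_ptr |= MARK            # set a bottom bit: visited
        stack.extend(obj.refs)


def sweep(heap):
    survivors = []
    for obj in heap:
        if obj.dispatch_ptr & MARK:
            obj.dispatch_ptr &= ~MARK       # the extra per-object assignment
            survivors.append(obj)
        # unmarked (white) objects are dropped, i.e. reclaimed
    return survivors


TABLE = 0x804A010                           # hypothetical 4-byte-aligned dispatch table
o154, o158, o162, o170, o174 = (Obj(n, TABLE) for n in ("154", "158", "162", "170", "174"))
o154.refs = [o158]
o158.refs = [o162]
o162.refs = [o158]                          # cycle back to 158
o170.refs = [o174]                          # unreachable from the root set
heap = [o154, o158, o162, o170, o174]

mark_from([o154])                           # root set 150, reference 152
heap = sweep(heap)
print([o.name for o in heap])               # ['154', '158', '162']
print(all(o.dispatch_ptr == TABLE for o in heap))   # True
```

The cycle between 158 and 162 is handled by the already-visited check. Clearing the mark during the sweep leaves every surviving dispatch pointer valid again before the program resumes.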

FIG. 8 illustrates how hash code and monitor information are determined when using a reduced size object header. FIG. 8 illustrates an object 200 having a header 202 with a dispatch pointer 204. The address of the header 202 represents the location of the object 200. For example, the header may, in hexadecimal format, have an address of 0x2345678. A set of bits, preferably the lower-order bits, is used as a hash code indicator to reference a hash table 210. The hash table 210 contains a mapping of object addresses. For example, the hash table 210 has a slot 212 with an address 0x678, which corresponds to the lower-order 12 bits of 0x2345678. Alternatively, hash code information may be stored in an array (not shown) of linked lists, such as an exemplary linked list 225 (shown in FIG. 9). When an array is used, the hash index (e.g., 0x678) is used to find the appropriate linked list in the array. The linked list (e.g., linked list 225) is mapped to the objects in the program. Similar data structures (not shown) can be used to store monitor information.
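The address-to-bucket mapping in this example can be sketched as follows. `HASH_BITS = 12` is an assumption chosen to match the 0x2345678 → 0x678 example. Python lists stand in for the linked lists of FIG. 9, and `AddressHashTable` is a hypothetical name.

```python
# Sketch of FIG. 8: the low-order bits of an object's address select a bucket;
# each bucket chains the (address, info) pairs that share those bits.
HASH_BITS = 12
HASH_MASK = (1 << HASH_BITS) - 1


def hash_index(address: int) -> int:
    return address & HASH_MASK


class AddressHashTable:
    def __init__(self) -> None:
        self._buckets = {}              # buckets created lazily: only hashed objects cost memory

    def put(self, address: int, info: object) -> None:
        chain = self._buckets.setdefault(hash_index(address), [])
        for i, (addr, _) in enumerate(chain):
            if addr == address:
                chain[i] = (address, info)
                return
        chain.append((address, info))

    def get(self, address: int):
        for addr, info in self._buckets.get(hash_index(address), []):
            if addr == address:
                return info
        return None


table = AddressHashTable()
table.put(0x2345678, "monitor or hash record for object 200")
print(hex(hash_index(0x2345678)))       # 0x678
print(table.get(0x1111678))             # None: same bucket, different object
```

The full address is compared within a bucket, because many objects can share the same low-order bits.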

Although using a table or an array of linked lists to store hash code and monitor information adds memory overhead, in many instances hash code and monitor information is not needed. Accordingly, hash code and monitor information is maintained only for those objects where it is useful or needed. Thus, the memory overhead associated with the hash table or array of linked lists is relatively small.

One way of optimizing the use of a hash table or linked list for hash code and monitor information is to use a thread-local cache to store the relevant information. This helps maintain data integrity in a multi-threaded environment.

The use of one-word headers (when implemented in accordance with the embodiment shown in FIG. 4) affects the performance of various operations. In particular, sweeping the heap requires more indirection to determine the size of each object and where its references are. Thus, sweeping is slower. Acquiring or releasing monitors is generally slower than when the monitor information is kept in the object header. Accessing class information is also slower because of the added indirection. Finally, the invocation of virtual and interface methods requires more indirection to de-reference the dispatch table and, therefore, is slower. The inventor has found that the invocation of virtual and interface methods has the biggest impact on performance. Accessing the dispatch table pointer, when the class field is a table offset, requires additional instructions to be executed. For example, in x86 assembly language code, acquiring the dispatch table pointer for an invoke is no longer a single move; rather, the sequence is:

    MOV EAX, [ECX]                        ; get header word
    AND EAX, CLASS_FIELD_MASK             ; clear the GC bits, keeping the class field
    MOV EAX, [EAX+DISPATCH_TABLE_BASE]    ; load the dispatch table pointer

Nevertheless, in some environments, the memory savings offset any performance degradation.
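The three-instruction x86 sequence above can be replayed in Python against a toy memory of 32-bit words. The sketch rests on several assumptions that the text does not state:

- The GC bits occupy the low 2 bits of the header.
- The class field holds a global-index position shifted left by 2.
- The global index is an array of 4-byte pointers at `DISPATCH_TABLE_BASE`.

Under these assumptions, masking off the GC bits yields a byte offset that can be added directly to the base. All addresses are hypothetical.

```python
# Model of MOV / AND / MOV against a toy word-addressed memory.
GC_BITS = 2
CLASS_FIELD_MASK = 0xFFFFFFFF & ~((1 << GC_BITS) - 1)   # 0xFFFFFFFC
DISPATCH_TABLE_BASE = 0x0010_0000                       # hypothetical global index base

memory: dict[int, int] = {}


def load(address: int) -> int:
    return memory[address]


# Global index entry 3 points at a (hypothetical) dispatch table at 0x200040.
memory[DISPATCH_TABLE_BASE + 3 * 4] = 0x0020_0040
# An object whose header holds index 3 and GC bits 0b01.
OBJECT_ADDR = 0x0030_0000
memory[OBJECT_ADDR] = (3 << GC_BITS) | 0b01


def dispatch_table_pointer(ecx: int) -> int:
    eax = load(ecx)                                 # MOV EAX, [ECX]
    eax &= CLASS_FIELD_MASK                         # AND EAX, CLASS_FIELD_MASK
    return load(eax + DISPATCH_TABLE_BASE)          # MOV EAX, [EAX+DISPATCH_TABLE_BASE]


print(hex(dispatch_table_pointer(OBJECT_ADDR)))     # 0x200040
```

Storing the index pre-shifted by the GC field width is what allows a single AND to produce a usable offset. No separate shift instruction is needed before the second load.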

As can be seen from the above, the invention provides, among other things, a method and apparatus that reduce object header size. Various features and advantages of the invention are set forth in the following claims.

1. A computing environment, comprising: a plurality of objects each having a header with header address information therein and corresponding object class information, object size information and monitor information located outside the plurality of objects; the headers having 32 or fewer bits of header information including only garbage collection information and header index information, wherein the garbage collection information is stored in a predefined number N of bits of the headers and the header index information is stored in the remaining 32−N or fewer bits of the headers; header index to class information pointers for the plurality of objects to permit accessing the object class information in accordance with the index information; a first data structure for storing and providing a global index of the header index to class information pointers wherein the index information of a selected header within a selected object provides a global index address of 32−N or fewer bits for addressing a header index to class information pointer to provide a referenced header index to class information pointer; a second data structure separate from the first data structure for providing a dispatch table having slots for storing class pointers wherein a selected class pointer is addressed by the referenced header index to class information pointer to provide a referenced class pointer, the referenced class pointer indicating a location in a third data structure separate from the first and second data structures containing the corresponding object class information of the selected object whereby the corresponding object class information of the selected object is indirectly addressed by the 32−N or fewer bits of the header within the object by way of the global index and by way of the dispatch table; and a fourth data structure for storing the header address information of the plurality of objects wherein the monitor information of the selected object is obtained in accordance with a match between a hash code representation of the address of the selected header and the header address information within said fourth data structure.
2. A computing environment as claimed in claim 1, further comprising a table of garbage collection information mapped to the object.
3. A computing environment as claimed in claim 1, wherein the global index is addressed with an address equal to the size of the header.